AITopics | augmented dataset

Collaborating Authors

augmented dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

f0f800c92d191d736c4411f3b3f8ef4a-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 20:46:42 GMT

dataset, mutual information, rationale selector, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > North Carolina (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

Tan, Jing Jie, Mokraoui, Anissa, Kwan, Ban-Hoe, Ng, Danny Wee-Kiat, Hum, Yan-Chai

arXiv.org Artificial IntelligenceDec-10-2025

Abstract--Image captioning is essential in many fields including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution image (LRI). While performance can be improved by using larger models like transformers for encoding, these models are typically heavyweight, demanding significant computational resources and memory, leading to challenges in retraining. T o address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a solution specifically designed for lightweight, low-resolution images captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.23919/SPA61993.2024.10715604

2512.08873

Genre: Research Report (1.00)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Scaling to Multimodal and Multichannel Heart Sound Classification with Synthetic and Augmented Biosignals

Marocchi, Milan, Fynn, Matthew, Mandana, Kayapanda, Rong, Yue

arXiv.org Artificial IntelligenceDec-9-2025

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for approximately 17.9 million deaths each year. Early detection is critical, creating a demand for accurate and inexpensive pre-screening methods. Deep learning has recently been applied to classify abnormal heart sounds indicative of CVDs using synchronised phonocardiogram (PCG) and electrocardiogram (ECG) signals, as well as multichannel PCG (mPCG). However, state-of-the-art architectures remain underutilised due to the limited availability of synchronised and multichannel datasets. Augmented datasets and pre-trained models provide a pathway to overcome these limitations, enabling transformer-based architectures to be trained effectively. This work combines traditional signal processing with denoising diffusion models, WaveGrad and DiffWave, to create an augmented dataset to fine-tune a Wav2Vec 2.0-based classifier on multimodal and multichannel heart sound datasets. The approach achieves state-of-the-art performance. On the Computing in Cardiology (CinC) 2016 dataset of single channel PCG, accuracy, unweighted average recall (UAR), sensitivity, specificity and Matthew's correlation coefficient (MCC) reach 92.48%, 93.05%, 93.63%, 92.48%, 94.93% and 0.8283, respectively. Using the synchronised PCG and ECG signals of the training-a dataset from CinC, 93.14%, 92.21%, 94.35%, 90.10%, 95.12% and 0.8380 are achieved for accuracy, UAR, sensitivity, specificity and MCC, respectively. Using a wearable vest dataset consisting of mPCG data, the model achieves 77.13% accuracy, 74.25% UAR, 86.47% sensitivity, 62.04% specificity, and 0.5082 MCC. These results demonstrate the effectiveness of transformer-based models for CVD detection when supported by augmented datasets, highlighting their potential to advance multimodal and multichannel heart sound classification.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.11606

Country:

Europe (0.46)
Asia > India (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Introducing A Bangla Sentence - Gloss Pair Dataset for Bangla Sign Language Translation and Research

Saha, Neelavro, Shahriyar, Rafi, Roudra, Nafis Ashraf, Sakib, Saadman, Rasel, Annajiat Alim

arXiv.org Artificial IntelligenceNov-12-2025

Bangla Sign Language (BdSL) translation represents a low-resource NLP task due to the lack of large-scale datasets that address sentence-level translation. Correspondingly, existing research in this field has been limited to word and alphabet level detection. In this work, we introduce Bangla-SGP, a novel parallel dataset consisting of 1,000 human-annotated sentence-gloss pairs which was augmented with around 3,000 synthetically generated pairs using syntactic and morphological rules through a rule-based Retrieval-Augmented Generation (RAG) pipeline. The gloss sequences of the spoken Bangla sentences are made up of individual glosses which are Bangla sign supported words and serve as an intermediate representation for a continuous sign. Our dataset consists of 1000 high quality Bangla sentences that are manually annotated into a gloss sequence by a professional signer. The augmentation process incorporates rule-based linguistic strategies and prompt engineering techniques that we have adopted by critically analyzing our human annotated sentence-gloss pairs and by working closely with our professional signer. Furthermore, we fine-tune several transformer-based models such as mBart50, Google mT5, GPT4.1-nano and evaluate their sentence-to-gloss translation performance using BLEU scores, based on these evaluation metrics we compare the model's gloss-translation consistency across our dataset and the RWTH-PHOENIX-2014T benchmark.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.08507

Country: Asia (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education > Curriculum > Subject-Specific Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment

Urse, Adrian-Dinu, Cercel, Dumitru-Clementin, Pop, Florin

arXiv.org Artificial IntelligenceNov-4-2025

Abstract--Natural disaster assessment relies on accurate and rapid access to information, with social media emerging as a valuable real-time source. However, existing datasets suffer from class imbalance and limited samples, making effective model development a challenging task. This paper explores augmentation techniques to address these issues on the CrisisMMD multimodal dataset. For visual data, we apply diffusion-based methods, namely Real Guidance and DiffuseMix. For text data, we explore back-translation, paraphrasing with transformers, and image caption-based augmentation. We evaluated these across unimodal, multimodal, and multi-view learning setups. Results show that selected augmentations improve classification performance, particularly for underrepresented classes, while multi-view learning introduces potential but requires further refinement. This study highlights effective augmentation strategies for building more robust disaster assessment systems.

artificial intelligence, machine learning, social media, (19 more...)

arXiv.org Artificial Intelligence

2511.00004

Country:

North America (0.28)
Europe > Romania (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Difficulty-Controllable Cloze Question Distractor Generation

Kang, Seokhoon, Jeon, Yejin, Hwang, Seonjeong, Lee, Gary Geunbae

arXiv.org Artificial IntelligenceNov-4-2025

Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods often lack adaptability and control over difficulty levels, and the absence of difficulty-annotated datasets further hinders progress. To address these issues, we propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. First, to create a high-quality, difficulty-annotated dataset, we introduce a two-way distractor generation process in order to produce diverse and plausible distractors. These candidates are subsequently refined through filtering and then categorized by difficulty using an ensemble QA system. Second, this newly created dataset is leveraged to train a difficulty-controllable generation model via multitask learning. The framework includes carefully designed auxiliary tasks that enhance the model's semantic understanding of distractors and its ability to estimate their difficulty. Experimental results demonstrate that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.01526

Country:

Europe (0.93)
Asia (0.67)
North America > Mexico (0.28)

Genre: Research Report > New Finding (0.88)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Hierarchical Dual-Head Model for Suicide Risk Assessment via MentalRoBERTa

Yang, Chang, Wang, Ziyi, Tan, Wangfeng, Tan, Zhiting, Ji, Changrui, Zhou, Zhiming

arXiv.org Artificial IntelligenceOct-24-2025

School of Artificial Intelligence Beijing University of Posts and T elecommunications Beijing, China ziyiwang2003@bupt.edu.cn Abstract--Social media platforms have become important sources for identifying suicide risk, but automated detection systems face multiple challenges including severe class imbalance, temporal complexity in posting patterns, and the dual nature of risk levels as both ordinal and categorical. This paper proposes a hierarchical dual-head neural network based on MentalRoBERT a for suicide risk classification into four levels: indicator, ideation, behavior, and attempt. The model employs two complementary prediction heads operating on a shared sequence representation: a CORAL (Consistent Rank Logits) head that preserves ordinal relationships between risk levels, and a standard classification head that enables flexible categorical distinctions. A 3-layer Transformer encoder with 8-head multi-head attention models temporal dependencies across post sequences, while explicit time interval embeddings capture posting behavior dynamics. The model is trained with a combined loss function (0.5 CORAL + 0.3 Cross-Entropy + 0.2 Focal Loss) that simultaneously addresses ordinal structure preservation, overconfidence reduction, and class imbalance. T o improve computational efficiency, we freeze the first 6 layers (50%) of MentalRoBERT a and employ mixed-precision training. The model is evaluated using 5-fold stratified cross-validation with macro F1 score as the primary metric.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.20085

Country: Asia > China > Beijing > Beijing (0.44)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Machine Learning Approach for MIDI to Guitar Tablature Conversion

Kaliakatsos-Papakostas, Maximos, Bastas, Gregoris, Makris, Dimos, Herremans, Dorien, Katsouros, Vassilis, Maragos, Petros

arXiv.org Artificial IntelligenceOct-15-2025

Guitar tablature transcription consists in deducing the string and the fret number on which each note should be played to reproduce the actual musical part. This assignment should lead to playable string-fret combinations throughout the entire track and, in general, preserve parsimonious motion between successive combinations. Throughout the history of guitar playing, specific chord fingerings have been developed across different musical styles that facilitate common idiomatic voicing combinations and motion between them. This paper presents a method for assigning guitar tablature notation to a given MIDI-based musical part (possibly consisting of multiple polyphonic tracks), i.e. no information about guitar-idiomatic expressional characteristics is involved (e.g. bending etc.) The current strategy is based on machine learning and requires a basic assumption about how much fingers can stretch on a fretboard; only standard 6-string guitar tuning is examined. The proposed method also examines the transcription of music pieces that was not meant to be played or could not possibly be played by a guitar (e.g. potentially a symphonic orchestra part), employing a rudimentary method for augmenting musical information and training/testing the system with artificial data. The results present interesting aspects about what the system can achieve when trained on the initial and augmented dataset, showing that the training with augmented data improves the performance even in simple, e.g. monophonic, cases. Results also indicate weaknesses and lead to useful conclusions about possible improvements.

evolutionary algorithm, machine learning, tablature, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.6822204

2510.10619

Country: Europe > Greece (0.14)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Add feedback

Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis

Tao, Leitian, Du, Xuefeng, Li, Sharon

arXiv.org Artificial IntelligenceOct-15-2025

Reward modeling, crucial for aligning large language models (LLMs) with human preferences, is often bottlenecked by the high cost of preference data. Existing textual data synthesis methods are computationally expensive. We propose a novel framework LENS for synthesizing preference data directly in the LLM's latent embedding space. Our method employs a Variational Autoencoder (VAE) to learn a structured latent representation of response embeddings. By performing controlled perturbations in this latent space and decoding back to the embedding space, we efficiently generate diverse, semantically consistent synthetic preference pairs, bypassing costly text generation and annotation. We provide theoretical guarantees that our synthesized pairs approximately preserve original preference ordering and improve reward model generalization. Empirically, our latent-space synthesis significantly outperforms text-based augmentation on standard benchmarks, achieving superior results while being 18x faster in generation and using a 16,000x smaller model. Our work offers a scalable and effective alternative for enhancing reward modeling through efficient data augmentation. Code is publicly available at https://github.com/deeplearning-wisc/lens

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.26074

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation

Bellaouar, Slimane, Nehar, Attia, Souffi, Soumia, Bouameur, Mounia

arXiv.org Artificial IntelligenceAug-28-2025

Despite its significance, Arabic, a linguistically rich and morphologically complex language, faces the challenge of being under-resourced. The scarcity of large annotated datasets hampers the development of accurate tools for subjectivity analysis in Arabic. Recent advances in deep learning and Transformers have proven highly effective for text classification in English and French. This paper proposes a new approach for subjectivity assessment in Arabic textual data. To address the dearth of specialized annotated datasets, we developed a comprehensive dataset, AraDhati+, by leveraging existing Arabic datasets and collections (ASTD, LABR, HARD, and SANAD). Subsequently, we fine-tuned state-of-the-art Arabic language models (XLM-RoBERTa, AraBERT, and ArabianGPT) on AraDhati+ for effective subjectivity classification. Furthermore, we experimented with an ensemble decision approach to harness the strengths of individual models. Our approach achieves a remarkable accuracy of 97.79\,\% for Arabic subjectivity classification. Results demonstrate the effectiveness of the proposed approach in addressing the challenges posed by limited resources in Arabic language processing.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.19966

Country:

Europe (1.00)
North America (0.68)
Africa > Middle East > Algeria (0.29)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)

Add feedback